Active Learning and Feature Selection in the Drug Discovery Process
Author
Abstract
Non-technical: In collaboration with the computational chemists at Telik, we will develop and apply novel Machine Learning approaches to the characterization and classification of organic molecules with respect to their potential as pharmaceutical agents. In preliminary research we have already shown that our methods greatly improve the efficiency of the drug discovery cycle. In particular, we will develop search methods that identify small sets of chemical features of the compounds that are likely to be responsible for the relevant pharmaceutical properties.

Technical: We propose to use modern Machine Learning techniques to help speed up the drug discovery cycle. Candidate compounds are represented as high-dimensional descriptor vectors. The algorithms must decide which batch of compounds should be tested next and which features are responsible for the activity of the compounds. We use the maximum margin hyperplane separating the labeled compounds to select the next batch of unlabeled compounds. An alternative method based on the Voted Perceptron is more suitable for high-dimensional data. We also determine small sets of relevant features using the Maximum Entropy principle.

∗Computer Science Dept., University of California, Santa Cruz, CA 94065, USA
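The batch-selection step described in the abstract can be illustrated with a minimal sketch. The abstract does not say which compounds are picked relative to the hyperplane, so this example assumes one common choice: take the unlabeled compounds closest to the separating hyperplane. To stay dependency-free, a plain perceptron stands in for a true maximum margin solver; it finds a separating hyperplane but not the maximum margin one. The names `train_perceptron` and `select_batch` and the toy descriptor vectors are hypothetical.

```python
import math


def dot(w, x):
    """Inner product of two equal-length vectors."""
    return sum(wi * xi for wi, xi in zip(w, x))


def train_perceptron(X, y, epochs=20):
    """Stand-in linear separator (NOT maximum margin). Labels are +1 / -1."""
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        mistakes = 0
        for xi, yi in zip(X, y):
            if yi * (dot(w, xi) + b) <= 0:  # misclassified or on the boundary
                w = [wj + yi * xj for wj, xj in zip(w, xi)]
                b += yi
                mistakes += 1
        if mistakes == 0:  # all labeled compounds are separated
            break
    return w, b


def select_batch(w, b, pool, batch_size):
    """Indices of the pool vectors closest to the hyperplane w.x + b = 0.

    Assumed selection rule; the source only says the hyperplane is used.
    """
    norm = math.sqrt(dot(w, w)) or 1.0  # avoid dividing by zero
    ranked = sorted(range(len(pool)),
                    key=lambda i: abs(dot(w, pool[i]) + b) / norm)
    return ranked[:batch_size]


# Toy 2-dimensional descriptor vectors (illustrative data only).
labeled = [[2.0, 0.0], [3.0, 1.0], [-2.0, 0.0], [-3.0, -1.0]]
labels = [1, 1, -1, -1]  # +1 = active, -1 = inactive
pool = [[4.0, 0.0], [0.2, 1.0], [-5.0, 2.0], [-0.5, -1.0]]

w, b = train_perceptron(labeled, labels)
batch = select_batch(w, b, pool, batch_size=2)
print(batch)
```

In an actual loop, the selected compounds would be tested, their labels added to the labeled set, and the hyperplane retrained before the next batch is chosen.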
Similar resources
Sequential and Mixed Genetic Algorithm and Learning Automata (SGALA, MGALA) for Feature Selection in QSAR
Feature selection is of great importance in Quantitative Structure-Activity Relationship (QSAR) analysis. This problem has been addressed using meta-heuristic algorithms such as GA, PSO, ACO, and SA. In this work, two novel hybrid meta-heuristic algorithms, i.e. Sequential GA and LA (SGALA) and Mixed GA and LA (MGALA), which are based on the genetic algorithm and learning automata for QSAR f...
Bridging the semantic gap for software effort estimation by hierarchical feature selection techniques
Software project management is one of the significant activities in the software development process. Software Development Effort Estimation (SDEE) is a challenging task in software project management. SDEE has been practiced in the computer industry since the 1940s and has been reviewed several times. An SDEE model is appropriate if it provides the accuracy and confidence simultaneously before softwa...
A Random Forest Classifier based on Genetic Algorithm for Cardiovascular Diseases Diagnosis (RESEARCH NOTE)
Machine learning-based classification techniques support decision making in healthcare, especially in disease diagnosis, prognosis and screening. Healthcare datasets are voluminous, and their high dimensionality results in a slower learning rate and higher computational cost. Feature selection is expected to deal with the high dimen...
Online Streaming Feature Selection Using Geometric Series of the Adjacency Matrix of Features
Feature Selection (FS) is an important pre-processing step in machine learning and data mining. All the traditional feature selection methods assume that the entire feature space is available from the beginning. However, online streaming features (OSF) are an integral part of many real-world applications. In OSF, the number of training examples is fixed while the number of features grows with t...